STATISTICAL ANALYSIS OF NON-STATIONARY
SERIES OF EVENTS IN A DATA BASE SYSTEM

by 

P. A. W. Lewis
and

G. S. Shedler
September 1975


Approved for public release; distribution unlimited.
Prepared for:


Chief of Naval Research


NAVAL POSTGRADUATE SCHOOL
Monterey,  California 


Rear Admiral Isham Linder          Jack R. Borsting

Superintendent  Provost 


The work reported herein was supported by funds provided directly from
the Chief of Naval Research under Grant [...]

Reproduction  of  all  or  part  of  this  report  is  authorised. 


Prepared  by: 


Peter A. W. Lewis, Professor
Department of Operations Research


Gerald S. Shedler
IBM Research Laboratory


Reviewed  by: 


Department of Operations Research


Released by:


Robert Fossum
Dean of Research




UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT DOCUMENTATION PAGE


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

data base systems                stochastic models
workload                         data base management
computer systems                 probability modelling
workload characterization        non-stationary stochastic point processes



19. KEY WORDS (continued)

non-homogeneous Poisson process
cluster process
series of events
spectral analysis
rate function estimation
self-exciting processes
correlation
coherence


STATISTICAL ANALYSIS OF NON-STATIONARY
SERIES OF EVENTS IN A DATA BASE SYSTEM


P. A. W. Lewis*

Department of Operations Research
and Administrative Sciences
Naval Postgraduate School
Monterey, California 93940

G. S. Shedler
IBM Research Laboratory
San Jose, California 95193


ABSTRACT: Central problems in the performance evaluation of computer systems
are the description of the behavior of the system and characterization of the
workload. One approach to these problems comprises the interactive combination
of data-analytic procedures with probability modelling. This paper describes
methods, both old and new, for the statistical analysis of non-stationary
univariate stochastic point processes and sequences of positive random variables.
Such processes are frequently encountered in computer systems. As an illustration
of the methodology an analysis is given of the stochastic point process of
transactions initiated in a running data base system.

On the basis of the statistical analysis, a non-homogeneous Poisson process
model for the transaction initiation process is postulated for periods of high
system activity and found to be an adequate characterization of the data. For
periods of lower system activity, the transaction initiation process has a
complex structure, with more clustering evident. Overall models of this type have
application to the validation of proposed data base (sub)system models.


*Written while this author was a consultant to IBM Research. Support from the
Office of Naval Research under Grant [...] is gratefully acknowledged.


1.  Introduction 

Description of the behavior of a running system and characterization of the
workload are central problems in the performance evaluation of data base systems.
These are systems in which there are many users who can access, via remote
terminals, a (typically very large) data base managed by a computer. Such a system
should respond to a query in a reasonably short time, given the number of users
and the nature of the user environment. This must be accomplished as economically
as possible, where economy is understood to include both direct customer (waiting)
costs and computer system resource utilization. This is a typical operations
research situation in which we are trying to allocate limited resources in an
optimal way amongst competing demands. Because of the complexity of data base
systems, detailed measurements of existing systems are needed in order to model
and evaluate them; such measurements comprise just one aspect of performance
evaluation, which in its entirety would encompass data collection, analysis,
modelling, and interpretation. Ultimate goals of performance evaluation include
tuning of existing systems and prediction of performance of proposed systems.

This paper is concerned with methods for statistical analysis of series of
events which can be applied to obtain a graphical and mathematical description of
the behavior of a running data base system. Such a description would be a useful
starting point for studies aimed at workload characterization. The particular
analysis of data given uses a combination of statistical data-analytic procedures
and probability modelling (cf. Lewis and Shedler, 1973). The specific results
reported here for the analysis of a non-stationary univariate series of events
occurring in an IMS data base system are intended neither to comprise in
themselves a description of the running IMS system nor necessarily to be a
sufficient basis for characterizing the workload of an IMS system. Rather the
results are to be considered illustrative of methods that may be useful in such
studies.

In a data base system the workload may be taken to be a collection of data
sequences identifiable at various levels of the system; workload characterization
comprises the study of these data sequences (individually and jointly) along with
the transformations among them. We are deliberately vague here about what is
meant by data sequence; it could be a sequence of events occurring in time, i.e.
a point process, or a sequence of observations of a stochastic process, i.e. a
time series. For example, in an IMS data base system we can consider, at the
user level, sequences of transactions and DL/I calls; at the logical level,
sequences of target segments; at the segments searched level, sequences of path
segments; at the paging level, sequences of path blocks, etc. Associated with
these identified basic workload data sequences, there may be other data sequences
of interest, e.g., the subsequence of path block exceptions. We may also be
interested in (external) measurements related to the workload data sequences such
as response times for users.

Given the complexity of data base systems and the resulting relative difficulty
of carrying out meaningful performance evaluations and designs for such
systems, the collection and analysis of measurement data from representative
systems to identify and characterize significant performance phenomena seems
appropriate. The availability of such measurements presents the possibility of
obtaining thereby empirically valid, parameterized mathematical models for
workload data sequences. However, the sheer volume of data which can be collected
from a running data base system (e.g. tens of thousands of transactions per day,
hundreds of thousands of DL/I calls per day, millions of path segments per day,
etc.) is a source of some difficulties. Such a volume of data is not only costly
to manipulate, it is difficult to comprehend. In practice it appears that if we
wish to do a detailed analysis (and modelling) of any of the several workload
data sequences mentioned above, it is necessary to select "representative"
sequences observed during (relatively) short periods of time. If useful
information is to be obtained from the data collection, analysis, and modelling
(e.g. for the determination of pertinent system requirements), it is important to
be able to describe the system context in which the transaction workload
phenomena were observed and analyzed.

In addition to models of the workload, models of the system or sub-system
structure are needed in performance evaluation. The authors feel that stochastic
models of the type obtained in this study have application to the detailing of
proposed system models, i.e. filling in the fine structure of parts of the model.
A second application is to the "validation" of system models in the sense of
establishing their predictive value. The methods used for the statistical
analysis of data from the running system can also be used to analyze the output
of simulations of proposed (sub)system models. "Agreement" of a process
predicted by the system model with the corresponding process observed in the
running system would constitute evidence of the predictive value of the model.
Thus, for example, the results of the statistical analysis of the transaction
initiation process reported here could be used in attempting to validate a
stochastic model of the IMS DL/I component such as the queueing model developed
by Lavenberg and Shedler (1973).

2. Description of the Available Data

The analysis given here, illustrating methods for the examination of
non-stationary series of events, is of data obtained from an IMS data management
system. The following is a brief outline of the structure of IMS.

IMS (IBM Corp., 1973) is a processing program for the implementation of large
data bases shared in common by several applications. The IMS program executes
under the operating system of the computer system to extend the data
communication and data base management capabilities of the operating system. In
IMS users can access the data base from remote terminals by entering messages
called transactions. A particular transaction uses and thus uniquely identifies an
application program which processes the message (or transaction) and accesses the
data base. The data management facility of IMS is called Data Language/I (DL/I).
The two interfaces of an application program with DL/I are a data base
description and a program linkage which allows DL/I to process data base access
requests which arise during execution of an application program. The execution of
an application program thus gives rise to a sequence of calls to the DL/I
component of IMS.

A conceptual diagram of a computer system running IMS is given in Figure 1.
As shown there, a portion of memory is devoted to the operating system. The IMS
program occupies a portion of memory called the IMS control region. Application
programs reside in secondary storage in an application program library. For
execution an application program must be loaded into one of several (typically
three or four) regions in memory called IMS application regions. The data base
resides in secondary storage, and data are transferred into memory for processing
in response to transaction initiations.

Data on the processing of transactions have been obtained from a computer
system running IMS for production control under the IBM operating system OS.
Entry of data into the system is on-line and is governed by the occurrence of
events on the production line. The epochs of time at which individual DL/I calls
were completed (i.e. control returned to the application program) have been
recorded, along with information sufficient to identify the epochs of time at
which individual transactions were initiated. From these time stamps the
sequence of times between transaction initiations was derived. Most of the
results displayed in this paper are for a time period of high system activity
referred to as time period 8. This data consisted of 1999 transaction
initiations in a period of time (in unspecified units) of $t_0 = 11936.6066$. Much
of the statistical analysis was done using the SASE-IV program (Lewis, Katcher and
Weis, 1969) for analyzing series of events. SASE-IV has a maximum input of 1999
events; this accounts for the length of the period under study. This high system
activity period was selected after an initial overall look at the several days of
data on transaction initiations which was available. The analysis also used
SASE-VI (an improved version of SASE-IV), APL implementations of parts of SASE-VI,
and APL implementations of rate estimation procedures.


3. Preliminary Analysis of Transaction Initiation Process

3.1 Prior Considerations and Assumptions

In analyzing the transaction initiation data, there were a number of prior
assumptions which could be made about the data to serve as a starting point for
the analysis. The purpose of the data analysis is to confirm these assumptions
or to point to suitable modifications.

(1) Since the data is taken over a whole day (in fact, six whole days), we
expect a time of day effect as activity builds up through the working
day and then declines during the evening. Thus, any kind of initial
analysis based on an assumption of stationarity is inappropriate.

(2) Since the data consists of times of transaction initiations, so that we
are dealing with a point process or series of events, the usual null
model, which is delineated in Section 4, is a non-homogeneous Poisson
process (NHPP). This could be appropriate here since the transaction
process is a superposition (Cox and Lewis, 1966, Ch. 8; Cinlar, 1972) of
inputs from a number of sources (users).

(3) Since each user's activity is likely to consist of a (random) number of
transactions after initial sign on, some clustering in the data might be
expected. An appropriate model here is the non-homogeneous Poisson
cluster process (Lewis, 1967). In this process an initial primary
(main) event generates a finite sequence of secondary (subsidiary)
events; the complete process is then the superposition of the primary
and secondary events, where the main events are assumed to be generated
by a non-homogeneous Poisson process. If enough initial events are
generated (high activity) so that the number of secondary processes is
large, this process is hard to distinguish from a Poisson process.

Starting from these assumptions, the analysis of the data proceeded as
follows:

(a) A very rough, model-free procedure was used to estimate the rate
function for the transaction initiation process over the whole day, the rate
function being the derivative of the expected number of transactions in
a time period $(0,t]$. This rate would be constant for a stationary
(homogeneous) process.

(b) On the basis of this trend analysis, relatively homogeneous high and
low-activity periods were selected, and an attempt was made to verify the
NHPP and/or the clustering model for the transaction initiation
process.

(c) Based on this local analysis and modelling of the transaction initiation
process, more formal model-dependent estimation procedures were applied
to the transaction rate function for the several days.

In later sections it will be seen that the Poisson assumption is reasonably valid
for high-activity periods, clustering becomes more evident at low-activity
periods, and there is a surprising amount of local inhomogeneity of an almost
oscillatory (cyclic) nature. It is this last phenomenon which is perhaps the most
interesting aspect of the analysis.

3.2  Analysis  of  Transaction  Initiation  Counting  Process 

Point processes can be analyzed either in terms of the intervals between
events, which is a stochastic sequence (time series), or the counting process
(the number of events in an interval $(0,t]$) which, as a function of $t$, is a
continuous parameter stochastic process. Here 0 is some convenient fixed
origin, the number of events in $(0,t]$ is denoted by $N_t$, and the expected
value of $N_t$ is

    $$M(t) = E(N_t). \qquad (3.1)$$

Its derivative, often called the rate function or intensity function, is

    $$m(t) = \frac{dM(t)}{dt} = \lambda(t),$$

the notation $\lambda(t)$ being generally used for the rate function of a Poisson
process. (See Cox and Lewis, 1966, Ch. 4, for further definitions of point
processes.)

Note that although the times of the transaction initiation events for the
six days were available, for an initial analysis we used counts of events in
successive unit time intervals, i.e. $\Delta = 1$. This constitutes a sampling of
the data; if the data were from a NHPP, these counts would be independent Poisson
variates with possibly different means (see Section 4). Let these counts be
$n_j$, $j = 1,\ldots,n$, where $n_j = N_j - N_{j-1}$ and $N_0 = 0$. If these counts
are summed to give counts in $C$ contiguous intervals, they will still be Poisson
distributed. Such a summation can be considered as

(1) a crude smoothing of the data to obtain an estimate and picture of the
rate function over the day. Thus, since $\Delta = 1$,

    $$\hat{m}(t) = \frac{1}{C} \sum_{j} n_j,$$

where the sum is over the $C$ unit intervals of the group containing $t$;
the weights in the smoothing all have value $1/C$. This constant smoothing
function must be used with care; it can cause spurious effects if
the rate is not changing linearly.

(2) a coalescing of count data to test for homogeneity.
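
The counting and coalescing steps just described can be sketched in a few lines of Python. This is a minimal illustration and not the SASE or APL code used in the study; the function names `unit_counts`, `coalesce` and `smoothed_rate` and the short list of event times are assumptions made for the example.

```python
import math


def unit_counts(event_times, n_intervals):
    """Counts n_j of events in the unit intervals (j-1, j], j = 1..n_intervals."""
    counts = [0] * n_intervals
    for t in event_times:
        j = math.ceil(t)              # an event at time t lies in (j-1, j]
        if 1 <= j <= n_intervals:
            counts[j - 1] += 1
    return counts


def coalesce(counts, C):
    """Sum the unit counts over successive groups of C contiguous intervals.

    An incomplete group at the end of the record is dropped.
    """
    n_groups = len(counts) // C
    return [sum(counts[g * C:(g + 1) * C]) for g in range(n_groups)]


def smoothed_rate(counts, C):
    """Equal-weight (1/C) smoothing: events per unit time within each group."""
    return [total / C for total in coalesce(counts, C)]


# Hypothetical event times, for illustration only.
times = [0.2, 0.7, 1.5, 2.1, 2.2, 2.9, 3.4, 5.8]
counts = unit_counts(times, 6)
print(counts, coalesce(counts, 3), smoothed_rate(counts, 3))
```

The coalesced totals serve the homogeneity tests, while dividing by C gives the step-function picture of the rate.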




Plots of the smoothed counts using $C = 4800$ are shown in Figure 2 for 1 of
the 6 days, and for the average of the smoothed counts over all 6 days. Formal
tests for homogeneity are available for Poisson variates (Cox and Lewis, 1966,
Ch. 4), or also a one-way analysis of variance can be performed on the coalesced
data after a square root transformation. The analysis of variance test is used
because the counts are large enough to be considered to be normally distributed;
the square root transformation is used because although Poisson counts with a
large mean are approximately normally distributed (see Table 2.1, Cox and Lewis,
1966, p. 21) the mean and the variance are the same, and this violates a basic
assumption in the analysis of variance test. The square root of a Poisson
variate $N$ plus one-fourth, $\sqrt{N + 1/4}$, has mean approximately equal to
$\sqrt{\mu}$ and variance 1/4, where $\mu$ is the Poisson mean (Cox and Lewis,
1966, p. 44).
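
The variance-stabilizing property of the square root transformation can be checked by a small simulation. The sketch below is illustrative only: the Poisson sampler (counting unit-rate exponential gaps), the seed, the sample size and the three means are all assumptions chosen for the example.

```python
import math
import random
import statistics


def poisson_variate(mu, rng):
    """Poisson count: number of unit-rate exponential gaps that fit in (0, mu]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= mu:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def sqrt_transform(n):
    """Square root of a count plus one-fourth."""
    return math.sqrt(n + 0.25)


rng = random.Random(12345)            # fixed seed so the run is repeatable
results = {}
for mu in (10.0, 40.0, 160.0):        # hypothetical Poisson means
    raw = [poisson_variate(mu, rng) for _ in range(5000)]
    x = [sqrt_transform(n) for n in raw]
    results[mu] = (statistics.variance(raw),   # grows with mu
                   statistics.mean(x),         # close to sqrt(mu)
                   statistics.variance(x))     # close to 1/4 for every mu

for mu, (raw_var, x_mean, x_var) in results.items():
    print(mu, round(raw_var, 2), round(x_mean, 3), round(x_var, 3))
```

The raw variance tracks the mean, whereas the variance of the transformed counts stays near 1/4, which is what the analysis of variance requires.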

The analysis of selected time periods reported below is for periods chosen
from day 2. In Table 1 we show in successive columns the number of counts
(transaction initiations) in successive groups of forty 120 time unit periods;
the mean number of counts in 1 time unit (the rate function estimate plotted in
Figure 2) for day 2; $\bar{x}_i$, the average of forty quantities $x_{ij}$, where

    $$x_{ij} = \{(\text{number of counts in the $j$th 120 time unit period in group $i$}) + 1/4\}^{1/2};$$

and $\sigma_i^2$ and $\sigma_i$, the within group sample variance and standard
deviation respectively.

Firstly, it can be seen that all of the variances $\sigma_i^2$ are larger than
the value 1/4 postulated on the basis of a homogeneous Poisson count process;
since $39 \cdot \sigma_i^2/(1/4) = 156 \cdot \sigma_i^2$ should, under the null
hypothesis, have a $\chi^2$ distribution (with 39 degrees of freedom) with upper
99% point of 62.281, all the $\sigma_i^2$ are significantly large (i.e. greater
than 62.281/156 = 0.3992) and either the Poisson or homogeneity (within group)
assumptions are invalid.

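Comparing the sum of the within group sample variances $\sigma_i^2$, which is
42.1826, to the between group variance (or sample variance of the $\bar{x}_i$),
which has a value 1.7126, we get an F-ratio of 19.4878. The F-ratio, formally
given by

    $$F = \frac{n\,\sigma_{\bar{x}}^2}{\sum_{i=1}^{k} \sigma_i^2 / k},$$

where $\sigma_{\bar{x}}^2$ is the sample variance of the $\bar{x}_i$, $n = 40$ is the
number of periods per group and $k = 12$ is the number of groups, has an
F-distribution with $(n-1) \cdot k = 39 \cdot 12$ (within group) and $k - 1 = 11$
(between group) degrees of freedom, and the value 19.4878 in Table 1 is highly
significant at a 5% level or at a 1% level. We conclude that the data is
inhomogeneous, although departure from a Poisson assumption has not been ruled
out.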
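
A minimal sketch of the two checks on square-root-transformed counts follows: the within-group variance statistic compared against a chi-square point, and the one-way F-ratio of between-group to pooled within-group variance. The function names and the toy counts are assumptions for illustration; critical values are taken from tables and are not computed here.

```python
import math
import statistics


def poisson_variance_statistic(within_variance, n):
    """(n - 1) * s^2 / (1/4): compare with a chi-square point on n - 1 d.f."""
    return (n - 1) * within_variance / 0.25


def sqrt_anova(groups):
    """One-way analysis of variance on sqrt(count + 1/4).

    groups: k lists, each holding n counts.
    Returns (within-group variances, between-group variance, F-ratio),
    with F = n * var(group means) / mean(within-group variances).
    """
    n = len(groups[0])
    k = len(groups)
    x = [[math.sqrt(c + 0.25) for c in g] for g in groups]
    means = [statistics.mean(g) for g in x]
    within = [statistics.variance(g) for g in x]
    between = statistics.variance(means)
    f_ratio = n * between / (sum(within) / k)
    return within, between, f_ratio


# Hypothetical counts: two groups of four periods with clearly different levels.
within, between, f_ratio = sqrt_anova([[2, 6, 2, 6], [20, 30, 20, 30]])
print(within, between, f_ratio)
```

With forty periods per group, `poisson_variance_statistic(s2, 40)` is the quantity 156 times the within-group variance that the text compares with the tabulated chi-square point.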

The overall picture in Figure 2 is of an initial build up in transaction rate,
a fairly constant transaction rate for a period of time, and then a drop to a
lower level. This picture is consistent over days; the drop in day 1 (around
t = 165888) was due to a period for which data was not available.


However, even in the two relatively stable periods, there is some evidence
(large values of $\sigma_i^2$ in Table 1 relative to 1/4) of more microscopic
inhomogeneity, and the analysis proceeded by examining sections of data in these
high and low-activity periods in more detail. The examination was of interest per
se, but was also motivated by a need for more formal statistical rate estimation
procedures.

Highly parametric global procedures for rate estimation are available at
present only for NHPP's. Details of the procedure and the estimation are given
in the next two sections. Application to the data for the high and low system
activity periods and for the entire day is described in later sections.

In addition, non-parametric local smoothing procedures related to kernel-type
density estimates (Rosenblatt, 1956) are used. These are also described later
(Section 5.2). First we give properties of the NHPP.

4. Non-Homogeneous Poisson Process Model

The non-homogeneous Poisson process model for a series of events $N_t$ is
discussed in a statistical context by Cox and Lewis (1966, Ch. 3), Lewis (1972),
Cox (1972), and Brown (1972). A very detailed mathematical account is given in
Gnedenko and Kovalenko (1969); a recent treatment is by Cinlar (1975). Like the
homogeneous Poisson process, the non-homogeneous Poisson process arises as a
limit of the superposition of a large number of non-stationary point processes
(cf. Cinlar, 1972). The assumptions underlying the non-homogeneous or
time-dependent Poisson process (NHPP) are the same as those for the ordinary
Poisson process except that the rate parameter $\lambda$ is now considered to be a
continuous function of time $\lambda(t)$. One approach to the NHPP is via the
incremental probabilities in small intervals. Thus, for $s, t \ge 0$, and denoting
by $N(s;t)$ the number of events in the process in the interval $(t,t+s]$, the
assumptions for a NHPP with rate function $\lambda(t)$ are that, as $s \to 0$,

    $$\Pr\{N(s;t) = 0\} = 1 - \lambda(t)s + o(s),$$
    $$\Pr\{N(s;t) = 1\} = \lambda(t)s + o(s), \qquad (4.1)$$

and that the random variable $N(s;t)$ is statistically independent of the number
and position of events in $(0,t]$. As a consequence of (4.1),

    $$\Pr\{N(s;t) \ge 2\} = o(s).$$

The survivor function for the forward recurrence time in the process, the
probability that there are no events in $(t,t+s]$, i.e. that $N(s;t) = 0$, is
derived via first-order differential equations to be

    $$R(s;t) = \exp\left\{ -\int_t^{t+s} \lambda(u)\,du \right\}. \qquad (4.2)$$


A more general approach to defining the NHPP starts with the function
$\Lambda(t)$, which is assumed to be monotone non-decreasing and continuous from
the right; then the number of events occurring in any interval, say $(t,t+s]$, is
assumed to have a Poisson distribution with parameter

    $$\Lambda(t+s) - \Lambda(t) = \int_t^{t+s} \lambda(u)\,du,$$

i.e. for $k = 0,1,2,\ldots$

    $$\Pr\{N(s;t) = k\} = \frac{e^{-\{\Lambda(t+s)-\Lambda(t)\}}\,\{\Lambda(t+s)-\Lambda(t)\}^k}{k!}.$$
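
As a numerical illustration of these definitions, the sketch below evaluates the interval-count probabilities and the probability of no events for a log-linear rate $\lambda(t) = \exp(\alpha_0 + \alpha_1 t)$, whose integral has a closed form. The coefficient values and all function names are assumptions chosen only for the example.

```python
import math

A0, A1 = math.log(2.0), 0.5           # hypothetical coefficients, A1 != 0


def rate(t):
    """Rate function lambda(t) = exp(A0 + A1 * t)."""
    return math.exp(A0 + A1 * t)


def integrated_rate(t):
    """Integral of the rate over (0, t] in closed form."""
    return math.exp(A0) * (math.exp(A1 * t) - 1.0) / A1


def interval_mean(t, s):
    """Poisson parameter for the count in (t, t + s]."""
    return integrated_rate(t + s) - integrated_rate(t)


def prob_events(k, t, s):
    """Probability of exactly k events in (t, t + s]."""
    mean = interval_mean(t, s)
    return math.exp(-mean) * mean ** k / math.factorial(k)


def survivor(t, s):
    """Probability of no events in (t, t + s]."""
    return math.exp(-interval_mean(t, s))


print(survivor(1.0, 0.1), prob_events(1, 1.0, 0.1), rate(1.0) * 0.1)
```

For small s the probability of one event is close to $\lambda(t)s$, in line with the incremental assumptions, and the probabilities over all k sum to one.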

Consequently, $\Lambda(t)$ is the expected value function $M(t)$ discussed in
Section 3. In addition, the numbers of events in any finite set of
non-overlapping intervals are assumed to be independent random variables. There
are other equivalent definitions, and also minimal definitions; see Gnedenko and
Kovalenko (1969) and Cinlar (1975).

The following theorem (cf. Cinlar, 1975) establishes that a homogeneous
Poisson process of rate 1 can be obtained by transformation of the time scale of
a NHPP, via the inverse of $\Lambda(t)$. This result, Theorem 4.1, and the
following Theorem 4.2 are the basis for the procedures described in Sections 6
and 7 below for detrending the data and testing the goodness-of-fit of the NHPP
model.

Theorem 4.1. Let $\Lambda(t)$ be a non-decreasing right continuous function of
$t \ge 0$. Then $T_1, T_2, \ldots$ are the times to events in a NHPP with
$E\{N_t\} = \Lambda(t)$, if and only if $T_1' = \Lambda(T_1), T_2' = \Lambda(T_2), \ldots$
are the times to events in a homogeneous Poisson process with rate 1.
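
Read in the reverse direction, Theorem 4.1 gives a way to generate an NHPP: generate a unit-rate homogeneous Poisson process and map its event times back through the inverse of $\Lambda$. The sketch below assumes a log-linear rate $\exp(\alpha_0 + \alpha_1 t)$ with $\alpha_1 \ne 0$, for which the inverse is available in closed form; the function names, seed and parameter values are assumptions for illustration.

```python
import math
import random


def Lambda(t, a0, a1):
    """Integrated rate of lambda(t) = exp(a0 + a1 * t) over (0, t], a1 != 0."""
    return math.exp(a0) * (math.exp(a1 * t) - 1.0) / a1


def Lambda_inverse(x, a0, a1):
    """Time t at which the integrated rate reaches the value x."""
    return math.log(1.0 + a1 * x / math.exp(a0)) / a1


def simulate_nhpp(a0, a1, t0, rng):
    """Event times in (0, t0] of an NHPP with rate exp(a0 + a1 * t)."""
    times = []
    limit = Lambda(t0, a0, a1)
    x = rng.expovariate(1.0)          # first event of the unit-rate process
    while x <= limit:
        times.append(Lambda_inverse(x, a0, a1))
        x += rng.expovariate(1.0)     # next unit-rate exponential gap
    return times


rng = random.Random(7)                # fixed seed so the run is repeatable
a0, a1, t0 = math.log(50.0), 1.0, 2.0
times = simulate_nhpp(a0, a1, t0, rng)
print(len(times), Lambda(t0, a0, a1))
```

Applying `Lambda` to the simulated times recovers the unit-rate process, which is the detrending operation the theorem supports.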

The next theorem establishes an important property of the NHPP which we use
throughout the paper.

Theorem 4.2. Assume we have a NHPP observed for a fixed time $(0,t_0]$, in which
$N_{t_0} = n$ events occur at times $T_1 < T_2 < \cdots < T_n \le t_0$. Then,
conditional on having observed $n$ $(> 0)$ events in $(0,t_0]$, the $T_i$'s are
distributed as the order statistics from a sample with distribution function

    $$F(t) = \frac{\Lambda(t)}{\Lambda(t_0)}, \qquad 0 \le t \le t_0,$$

and, when $\Lambda(t)$ is absolutely continuous, probability density function

    $$f(t) = \frac{\lambda(t)}{\Lambda(t_0)}, \qquad 0 \le t \le t_0.$$

Thus, we see that (conditionally) the transformation of the time axis is
exactly the same as the probability integral transform which is used to transform
a random variable $X$ with known distribution function $F(x)$ into a uniform
random variable on (0,1), i.e. $U = F(X)$ is uniform (0,1). This transformation
is the basis for non-parametric tests of distribution functions such as the
Kolmogorov-Smirnov test. The analogy explains why tests for a homogeneous
Poisson process (HPP) are similar to tests for completely specified distributions
obtained from independent, identically distributed samples; the primary
difference in the two procedures lies in the alternative hypotheses which arise
(see Cox and Lewis, 1966, Ch. 6). Specifically, if we test that a random sample
$X_1,\ldots,X_n$ with unknown distribution function $F(x)$ is from a given
distribution function $F_0(x)$, then if $F_0(x) \ne F(x)$, the variables
$U_1 = F_0(X_1),\ldots,U_n = F_0(X_n)$ are i.i.d., but not uniformly distributed.
However, if we test (conditionally) that $n$ observed times-to-events
$T_1 < \cdots < T_n$ are from a NHPP with given integrated rate function
$\Lambda_0(t)$, then

(1) if the process is NHPP but $\Lambda_0(t)$ is not equal to the true integrated
rate function $\Lambda(t)$, then $T_1' = \Lambda_0(T_1),\ldots,T_n' = \Lambda_0(T_n)$
are i.i.d., but not uniform on $(0,\Lambda_0(t_0)]$,

and

(2) if the process is not NHPP, then even if $\Lambda_0(t)$ is equal to
$\Lambda(t)$, the $T_i'$'s, $i = 1,\ldots,n$, are not conditionally a random sample.
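
The conditional test described here can be sketched as follows: transform the observed times with a hypothesized integrated rate function, rescale to (0,1), and compute the Kolmogorov-Smirnov distance from the uniform distribution. The function names and the four event times are assumptions for illustration; significance points would come from standard tables and are not computed.

```python
def conditional_uniforms(times, t0, Lambda0):
    """U_i = Lambda0(T_i) / Lambda0(t0) for a hypothesized integrated rate."""
    total = Lambda0(t0)
    return [Lambda0(t) / total for t in times]


def ks_uniform_statistic(u):
    """Kolmogorov-Smirnov distance between the sample u and uniform (0,1)."""
    u = sorted(u)
    n = len(u)
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    return max(d_plus, d_minus)


# Hypothetical, evenly spread event times on (0, 4].
times, t0 = [0.5, 1.5, 2.5, 3.5], 4.0

# Constant-rate hypothesis: Lambda0(t) proportional to t.
d_constant = ks_uniform_statistic(conditional_uniforms(times, t0, lambda t: 2.0 * t))

# Linearly increasing rate hypothesis: Lambda0(t) proportional to t squared.
d_increasing = ks_uniform_statistic(conditional_uniforms(times, t0, lambda t: t * t))

print(d_constant, d_increasing)
```

For these evenly spread times the constant-rate hypothesis gives the smaller distance. As the text notes, a large distance cannot by itself separate a wrong rate function from a departure from the Poisson assumptions.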

The above leads to very different considerations in the power of tests for
NHPP's and completely specified distributions, even though the test statistics
are the same (see Lewis, 1963, for greater detail). It is difficult in testing
for NHPP's with procedures based on the above theorems, to separate out the
effects of departures from Poisson assumptions and departures from assumptions as
to the form of $\lambda(t)$. However, since both HPP's and NHPP's have independent
count increments, tests for the global Poisson assumption are based on this
property. In particular, the spectrum of counts (Cox and Lewis, 1966, Ch. 3)
should be flat after detrending.

In the following section we discuss estimation of the NHPP rate function
using parametric models, both to describe in a global way the rate function (as
opposed to the local smoothing in Figure 2) and to detrend the data so as to
examine the global Poisson assumption. Non-parametric rate estimation is also
briefly discussed.

5. Estimation of the NHPP Rate Function

5.1 Parametric Model and Rate Estimation

Following Cox and Lewis (1966, Ch. 1) and Cox (1972), an exponential
polynomial rate function has been assumed for the NHPP, i.e. $\lambda(t)$ of the
form

    $$\lambda(t) = \exp\left( \sum_{m=0}^{r} \alpha_m t^m \right). \qquad (5.1)$$

This assumption is convenient and constitutes no real restriction since any
continuous rate function can be approximated arbitrarily closely by an
exponential polynomial. The result follows from results on ordinary polynomials
by taking logarithms; note that $\lambda(t) > 0$ for any values of
$\alpha_0,\ldots,\alpha_r$. We describe now statistical procedures based on this
model. Formal tests for the degree $r$ of an exponential polynomial rate function
are discussed in Section 6. Here a procedure is outlined for the maximum
likelihood estimation of the coefficients $\{\alpha_m\}$ of an exponential
polynomial of fixed degree $r$.


The times-to-events $T_1 < T_2 < \cdots < T_n$ in a fixed time period $(0, t_0]$
and the random variable $N(t_0) = n$ have a joint density function (Cox and
Lewis, 1966, Ch. 3)

    $$f(t_1, \ldots, t_n; n) = \exp\Bigl\{-\int_0^{t_0} \lambda(u)\,du\Bigr\} \prod_{i=1}^{n} \lambda(t_i), \tag{5.2}$$


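which, on substituting the rate function (5.1), becomes

    $$f(t_1, \ldots, t_n; n) = \exp\Bigl\{\sum_{m=0}^{r} \alpha_m s_m - \int_0^{t_0} \exp\Bigl(\sum_{m=0}^{r} \alpha_m t^m\Bigr) dt\Bigr\}, \tag{5.3}$$

where

    $$s_m = \sum_{i=1}^{n} t_i^m, \qquad m = 0, \ldots, r. \tag{5.4}$$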


Thus, the log-likelihood function, $\log L$, the logarithm of the density at
the observed values of the random variables considered as a function of the $r+1$
parameters, is

    $$\log L(\alpha_0, \ldots, \alpha_r) = \sum_{m=0}^{r} \alpha_m s_m - \int_0^{t_0} \exp\Bigl(\sum_{m=0}^{r} \alpha_m t^m\Bigr) dt. \tag{5.5}$$
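As a minimal sketch of how Eq. (5.5) might be evaluated numerically, the
following assumes a composite Simpson rule for the integral; the function names
`simpson` and `log_likelihood` and the example data are illustrative assumptions
and are not taken from the report.

```python
import math

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def log_likelihood(alpha, times, t0):
    """Eq. (5.5): sum_m alpha_m s_m minus the integral of the rate over (0, t0]."""
    s = [sum(t ** m for t in times) for m in range(len(alpha))]  # s_0 equals n
    rate = lambda t: math.exp(sum(a * t ** m for m, a in enumerate(alpha)))
    return sum(a * sm for a, sm in zip(alpha, s)) - simpson(rate, 0.0, t0)

# Hypothetical check: constant rate 2 on (0, 5] with three events,
# for which log L = 3*log(2) - 10 in closed form.
example = log_likelihood([math.log(2.0)], [1.0, 2.0, 4.0], 5.0)
```

For degree zero the integral is available in closed form, which gives a simple
check of the numerical integration before higher degrees are attempted.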


It follows that the derivatives, known as the scores, are

    $$\frac{\partial \log L}{\partial \alpha_k} = s_k - \int_0^{t_0} t^k \exp\Bigl(\sum_{m=0}^{r} \alpha_m t^m\Bigr) dt, \qquad k = 0, 1, \ldots, r. \tag{5.6}$$

The solution $\{\hat\alpha_m\}$ to the system of Eqs. (5.6), the score vector
when set to zero, gives the maximum likelihood estimates of $\{\alpha_m\}$, and
can be determined numerically by Newton-Raphson iteration. The numerical
procedure works well provided that an initial vector sufficiently near the
solution is known. A two-step method for obtaining such an initial value has been
proposed by MacLean (1974). His procedure consists of finding an ordinary
polynomial representation of the same degree as $\lambda(t)$ having the observed
sums of powers $(s_m)$ for its moments. An exponential polynomial approximation
to this polynomial, obtained by taking logarithms and again fitting moments,
serves as the initial value for the Newton-Raphson iteration. This MacLean
procedure has been implemented in APL and used to estimate the coefficients
$\{\alpha_m\}$. The procedure appears to work well for polynomials up to degree
8. Estimates of the covariance matrix of the maximum likelihood estimates
$\{\hat\alpha_m\}$ are obtained from the second order partial derivatives of the
log-likelihood equation when evaluated at the estimated parameter values.
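The Newton-Raphson iteration on the scores can be sketched as follows. This is
an illustration under stated assumptions rather than the report's APL program:
times are assumed to be normalised to $u = t/t_0$, integrals use a Simpson rule,
and the iteration starts from a constant rate instead of MacLean's two-step
starting value. All function names and the example data are hypothetical.

```python
import math

def simpson(f, steps=400):
    """Composite Simpson rule on [0, 1]; steps must be even."""
    h = 1.0 / steps
    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def moments(alpha, count):
    """Integrals of u^p * exp(sum_m alpha_m u^m) over [0, 1], p = 0..count-1."""
    rate = lambda u: math.exp(sum(a * u ** m for m, a in enumerate(alpha)))
    return [simpson(lambda u, p=p: u ** p * rate(u)) for p in range(count)]

def information(alpha):
    """Negative Hessian of log L: entry (j, k) is the (j+k)-th moment above."""
    r = len(alpha) - 1
    mom = moments(alpha, 2 * r + 1)
    return mom, [[mom[j + k] for k in range(r + 1)] for j in range(r + 1)]

def fit_exp_poly(u, r, iters=50, tol=1e-10):
    """Newton-Raphson for alpha_0..alpha_r from normalised times u_i in (0, 1]."""
    s = [sum(x ** k for x in u) for k in range(r + 1)]   # s_0 = n
    alpha = [math.log(len(u))] + [0.0] * r               # assumed start: constant rate
    for _ in range(iters):
        mom, info = information(alpha)
        score = [s[k] - mom[k] for k in range(r + 1)]    # scores on the u scale
        step = solve(info, score)
        alpha = [a + d for a, d in zip(alpha, step)]
        if max(abs(d) for d in step) < tol:
            break
    _, info = information(alpha)
    unit = lambda j: [1.0 if i == j else 0.0 for i in range(r + 1)]
    cov = [solve(info, unit(j)) for j in range(r + 1)]   # inverse information
    return alpha, cov

# Illustrative data: quantiles of a density proportional to exp(u) on (0, 1].
n = 200
u = [math.log(1.0 + (i + 0.5) / n * (math.e - 1.0)) for i in range(n)]
alpha_hat, cov = fit_exp_poly(u, r=1)
```

Because the log-likelihood is concave in the coefficients, the plain Newton
step is usually adequate for low degrees; for higher degrees a better starting
value, such as the one described above, becomes important.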

Once the appropriate degree of the polynomial is obtained by the methods of
Section 6, the rate function with the maximum likelihood estimates for the
$\alpha$'s can be plotted to obtain a picture of the rate function. The
procedures are clearly sensitive to the NHPP model; for this reason, we discuss
next non-parametric kernel-type estimates.


5.2 Non-parametric Kernel-type Rate Estimates

Theorem 4.2, which relates (conditionally) the rate function $\lambda(t)$ in a
NHPP to a density function in $(0, t_0]$,

    $$f(t) = \frac{\lambda(t)}{\Lambda(t_0) - \Lambda(0)}, \qquad 0 \le t \le t_0,$$

suggests we could use non-parametric probability density function estimates to
estimate rate functions, at least in NHPP's. The procedure chosen is the
non-parametric kernel-type density estimate introduced by Rosenblatt (1956).
Briefly, the procedure to estimate $f(t)$ from a random sample
$T_1, T_2, \ldots, T_n$ is as follows:

Define

    $$f_n(t) = \frac{1}{n\,b(n)} \sum_{j=1}^{n} W\Bigl(\frac{t - T_j}{b(n)}\Bigr),$$

where $W(u)$ is an integrable weight function with

    $$\int_{-\infty}^{\infty} W(u)\,du = 1,$$

and $b(n)$ is a positive bandwidth function which tends to zero as
$n \to \infty$ but is such that $1/n = o(b(n))$, i.e. $n\,b(n) \to \infty$. Thus,
we might have $b(n) = n^{-1/2}$, for example.


Note that for a given set of $T_j$'s, all estimates of this form are
themselves density functions, i.e.

    $$f_n(t) \ge 0, \qquad \int_{-\infty}^{\infty} f_n(u)\,du = 1,$$

and since the $T_j$'s are random variables, $f_n(t)$ is a stochastic process,
but clearly [text illegible in source]. Although this type of density estimate
does not require parametric assumptions to be made about $f(t)$, the bandwidth
function and kernel $W(u)$ [text illegible in source].

In this paper we have chosen $W(u)$ to be a triangular function and $b(n)$ to
be $1.25/n^{1/2}$.


The conditional structure of the NHPP makes the estimation of the rate
function $\lambda(t)$ similar to the non-parametric estimation of the density
function, but with two differences.

First, care must be taken with normalisation of the rate function estimate.
This is because the procedure above estimates the rate normalised by dividing by
$\Lambda(t_0) - \Lambda(0)$, and $\Lambda(t_0) - \Lambda(0)$ is unknown. For a
NHPP this is the mean of a Poisson variable which is estimated by $n$, the number
of events in $(0, t_0]$. Using this estimate for $\Lambda(t_0) - \Lambda(0)$ we
then get, as a rate function estimate,

    $$\hat\lambda(t; n, t_0) = n f_n(t).$$

This will be modal about the usual estimate of the rate $\lambda$ in a
homogeneous Poisson process, which is estimated by $\hat\lambda = n/t_0$. The
second difference is that when the density function estimation technique is
applied to rate function estimation there is no asymptotic justification for the
procedure.
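The kernel-type rate estimate described above can be sketched in a few lines.
The sketch assumes times already normalised to $(0, 1]$, a triangular weight
function on $[-1, 1]$, and the bandwidth $1.25/n^{1/2}$ quoted in the text; the
names `triangular` and `kernel_rate` and the evenly spaced test data are
illustrative assumptions.

```python
import math

def triangular(u):
    """Triangular weight function on [-1, 1]; it integrates to one."""
    return max(0.0, 1.0 - abs(u))

def kernel_rate(t, times, bandwidth=None):
    """Rate estimate n * f_n(t) from event times on the normalised scale."""
    n = len(times)
    b = bandwidth if bandwidth is not None else 1.25 / math.sqrt(n)
    density = sum(triangular((t - tj) / b) for tj in times) / (n * b)
    return n * density   # n estimates Lambda(t0) - Lambda(0)

# Hypothetical check: 400 evenly spaced events should give a rate near 400.
times = [(i + 0.5) / 400 for i in range(400)]
rate_mid = kernel_rate(0.5, times)
```

Near the ends of the observation period part of the kernel falls outside
$(0, 1]$, so the estimate is biased downward there unless an edge correction is
applied; none is attempted in this sketch.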


6. Tests for the Degree of the Exponential Polynomial Rate Function

6.1 Theory

The analysis of trends in a NHPP, based on the assumption of an exponential
polynomial rate function, is discussed in Cox and Lewis (1966, Ch. 3), and Lewis
(1972). In the latter paper, formal tests for the linear and quadratic terms in
the exponential polynomial are derived. We use here a direct extension of these
methods to yield tests for higher degree terms.


There are a number of possible hypotheses which can be tested when considering
the exponential polynomial rate function

    $$\lambda(t) = \exp\Bigl(\sum_{m=0}^{r} \alpha_m t^m\Bigr). \tag{6.1}$$

(1)  Some given subsets of the r + 1 parameters are zero. Asymptotic tests
for this hypothesis are based on standard maximum likelihood arguments;
see Cox (1972) and MacLean (1974) for details. Essentially the maximum
values of the likelihood functions under the two hypotheses are compared;
the difference of their logarithms, suitably normalised, has
(asymptotically) a $\chi^2$ distribution under the null hypothesis with
known degrees of freedom. The problem with this test is
phenomenological: one seldom knows a priori which subset to test.

(2)  It is possible to ask which subset of the r + 1 parameters gives the
best (most parsimonious) fit to the data. This has been worked out for
ordinary, normal theory linear polynomial regression (Daniel and Wood,
1971), but not for the NHPP case.

(3)  An alternative is to test for successive inclusion of higher order
polynomial terms. This is reasonable if the exponential polynomial is
being used in a purely descriptive way, and the statistical theory is
known. Strictly, we test that, for some $k \ge 1$,
$\alpha_1 \ne 0, \alpha_2 \ne 0, \ldots, \alpha_k \ne 0$,
$\alpha_{k+1} = 0, \alpha_{k+2} = 0, \ldots$ (The analogous normal time
series case is considered in great detail in Anderson, 1971, Ch. 2.) A
possible drawback would occur where there is a cyclic effect, e.g.

    $$\lambda(t) = \exp\{\alpha_0 + \kappa \sin(\omega_0 t + \theta)\}. \tag{6.2}$$

The series expansion of $\sin(\omega_0 t + \theta)$ gives a polynomial with
alternating zero and non-zero coefficients for powers of $t$ if the phase
angle is appropriate. This in turn is tied into the starting point of
observations.
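The remark about alternating zero coefficients in the cyclic case can be checked
directly from the Taylor expansion of the sine. The sketch below uses the
identity that the m-th derivative of $\sin(\omega_0 t + \theta)$ at $t = 0$ is
$\omega_0^m \sin(\theta + m\pi/2)$; the function name and the numerical values
are illustrative assumptions.

```python
import math

def sine_taylor_coefficients(omega, theta, degree):
    """Coefficient of t^m in the expansion of sin(omega*t + theta) about t = 0."""
    return [omega ** m * math.sin(theta + m * math.pi / 2.0) / math.factorial(m)
            for m in range(degree + 1)]

# With zero phase the even-index coefficients vanish (up to rounding).
zero_phase = sine_taylor_coefficients(2.0, 0.0, 5)
# With a phase of pi/2 (a cosine) the odd-index coefficients vanish instead.
quarter_phase = sine_taylor_coefficients(2.0, math.pi / 2.0, 5)
```

For an intermediate phase all coefficients are generally non-zero, which is why
the pattern depends on where the observation period starts relative to the
cycle.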

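We develop the procedure now for case 3; we have used it in an ad hoc manner
by testing until two or more successive zero coefficients occur. For a NHPP with
exponential polynomial rate function

    $$\lambda(t) = \exp\Bigl(\sum_{m=0}^{r} \alpha_m t^m\Bigr),$$

the likelihood of $n$ events in the period $(0, t_0]$ at times
$t_1 < t_2 < \cdots < t_n$ is

    $$L(\alpha_0, \alpha_1, \ldots, \alpha_r) = \exp\Bigl\{\sum_{m=0}^{r} \alpha_m s_m - \int_0^{t_0} \exp\Bigl(\sum_{m=0}^{r} \alpha_m u^m\Bigr) du\Bigr\}, \tag{6.3}$$

where

    $$s_m = \sum_{i=1}^{n} t_i^m, \qquad m = 0, 1, \ldots, r. \tag{6.4}$$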


The observations $\{t_i\}$ enter Eq. (6.3) only through
$(n, \sum t_i, \sum t_i^2, \ldots, \sum t_i^r)$, and it can be shown from the
exponential form of Eq. (6.3) that these are a set of sufficient statistics for
the set of parameters $\alpha_0, \ldots, \alpha_r$. There is, however, more
structure, and a formal test for the $r$-th degree term in the exponential
polynomial rate function can be based on the idea that for any given $r$ and
$\alpha_r$, $(n, \sum t_i, \ldots, \sum t_i^{r-1})$ are a set of sufficient
statistics for $\alpha_0, \alpha_1, \ldots, \alpha_{r-1}$, i.e. the distribution
of $\sum t_i^r$, given $n, \sum t_i, \ldots, \sum t_i^{r-1}$, is independent of
$\alpha_0, \ldots, \alpha_{r-1}$ for all values of $\alpha_r$. This is convenient
since we test $\alpha_r = 0$ against $\alpha_r \ne 0$ regardless of the values of
$\alpha_0, \ldots, \alpha_{r-1}$, i.e. they are nuisance parameters.


Denote $t_i/t_0$ by $u_i$ and $\sum u_i^r$ by $c_r$; a test for $\alpha_r$ is
then based on the statistic $c_r$ and its null hypothesis conditional
distribution, given $n, c_1, \ldots, c_{r-1}$. This distribution is not
[word illegible in source] small $t_0$ (equivalently small $n$); however,
asymptotically $c_1, \ldots, c_r$ will be jointly normally distributed with mean
value and variance that can be obtained from properties of the uniform
distribution. We assume a uniform $(0, t_0]$ distribution for the $t_i$'s since
$(n, \sum t_i, \ldots, \sum t_i^{r-1})$ are a set of sufficient statistics for
$\alpha_0, \alpha_1, \ldots, \alpha_{r-1}$, so that setting these parameters to
have value zero does not affect the final result but does simplify computations.
Then, also asymptotically, the conditional distribution of $c_r$, given
$n, c_1, \ldots, c_{r-1}$, is normally distributed with mean
$\mu_r = E(c_r \mid c_{r-1}, c_{r-2}, \ldots, c_1, n)$ and variance
$\sigma_r^2 = \mathrm{Var}(c_r \mid c_{r-1}, \ldots, c_1, n)$ obtainable from
normal theory.

The normal theory results are that to test the null hypothesis $H_0$:
$\alpha_r = 0, \alpha_{r+1} = 0, \ldots$, but $\alpha_0, \ldots, \alpha_{r-1}$
have any value, compute the statistic

    $$U_r = \frac{c_r - \mu_r}{\sigma_r} \tag{6.5}$$

and test as a mean 0, variance 1 normal deviate, i.e. accept $H_0$ at, say,
a 5% level if $|U_r| < 1.96$. Expressions for $\mu_r$ and $\sigma_r$ have been
derived by techniques of symbolic mathematics and the matrix operations above.
Details of the derivation will be reported elsewhere. The case $r = 1$ is
discussed in detail in Cox and Lewis (1966, Ch. 3).
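Since the expressions for $\mu_r$ and $\sigma_r$ are not reproduced here, the
following is only a sketch of one way they could be obtained: it assumes the
textbook conditional-normal formulas applied to the power sums of independent
uniform variables, with $E(u^j) = 1/(j+1)$ and
$\mathrm{Cov}(u^j, u^k) = 1/(j+k+1) - 1/\{(j+1)(k+1)\}$. The authors' own
expressions may differ in detail; all names and data are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def degree_test_statistic(u, r):
    """U_r = (c_r - mu_r) / sigma_r for normalised times u_i in (0, 1]."""
    n = len(u)
    c = [sum(x ** j for x in u) for j in range(1, r + 1)]
    mean = [n / (j + 1.0) for j in range(1, r + 1)]
    cov = [[n * (1.0 / (j + k + 1) - 1.0 / ((j + 1) * (k + 1)))
            for k in range(1, r + 1)] for j in range(1, r + 1)]
    if r == 1:
        mu, var = mean[0], cov[0][0]
    else:
        s11 = [row[:r - 1] for row in cov[:r - 1]]
        s1r = [cov[j][r - 1] for j in range(r - 1)]
        w = solve(s11, s1r)  # regression weights of c_r on c_1..c_{r-1}
        mu = mean[r - 1] + sum(wj * (cj - mj) for wj, cj, mj in zip(w, c, mean))
        var = cov[r - 1][r - 1] - sum(wj * sj for wj, sj in zip(w, s1r))
    return (c[r - 1] - mu) / math.sqrt(var)

# Hypothetical data: evenly spaced times (no trend) and times with rising rate.
flat = [(i + 0.5) / 100 for i in range(100)]
rising = [math.log(1.0 + (i + 0.5) / 200 * (math.e - 1.0)) for i in range(200)]
u1_flat = degree_test_statistic(flat, 1)
u2_flat = degree_test_statistic(flat, 2)
u1_rising = degree_test_statistic(rising, 1)
```

For $r = 1$ this reduces to $(\sum u_i - n/2)/(n/12)^{1/2}$, the statistic for
a log-linear trend, which gives a check on the general computation.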

6.2 Applications to High Level (H) Data

We discuss now the application of the parametric rate function testing scheme
of the previous subsection and the rate estimation procedures of Section 5 to a
more microscopic examination of the transaction initiation process during a
period of high system activity for day 2. This high-activity period is, in
Figure 2, from approximately t = 73721 to t = 13661. We will also use the
kernel-type density estimate of Section 5.2. We do this most particularly
because the NHPP assumption has, at this point, not been validated. Overall
characteristics of the sample are shown in Table 2. (The sample moments given
there should be used only as a guide; they are meaningless if the data is
non-homogeneous.)

The first question to be addressed is whether the data can, in this relatively
short high-activity period, be considered to be approximately homogeneous or
stationary.

Figure 3 shows the cumulative number of transactions initiated during this
time period. The departure from linearity is fairly great; assuming a homogeneous
Poisson process, the Kolmogorov-Smirnov measure of the departure from linearity
is

    $$D = n^{1/2} \sup_{0 \le u \le 1} |F_n(u) - u|, \tag{6.6}$$

where

    $$F_n(u) = \frac{\text{number of } t_i\text{'s} \le u t_0}{n}, \qquad 0 \le u \le 1. \tag{6.7}$$

This is the uniform conditional test in Cox and Lewis (1966, Ch. 6); conditional
on the observed number $n = 1999$ of events in $(0, t_0]$ it has the usual
Kolmogorov-Smirnov statistic distribution with upper 1% point 1.628; the observed
value is 2.389, which is an event of very small probability under the Poisson
hypothesis.
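The statistic of Eqs. (6.6) and (6.7) can be computed from the ordered,
normalised event times, since the supremum is attained at a jump of $F_n$. The
sketch below is illustrative; the function name and the small data set are
assumptions, and the value is to be compared with the upper 1% point 1.628
quoted above.

```python
import math

def ks_uniform_statistic(times, t0):
    """n^(1/2) * sup |F_n(u) - u| for event times on (0, t0]."""
    u = sorted(t / t0 for t in times)
    n = len(u)
    # At the i-th ordered point (0-based) F_n jumps from i/n to (i + 1)/n.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))
    return math.sqrt(n) * d

# Hypothetical check: four evenly spaced events on (0, 8].
stat = ks_uniform_statistic([1.0, 3.0, 5.0, 7.0], 8.0)
```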

These probabilities could be grossly in error if the data was more dispersed
than under the Poisson assumption, where by dispersion we mean either that the
standard deviation of the intervals between events or the counts of events in
long intervals is larger than would be expected under a Poisson assumption. (The
two are not independent.) These dispersions are usually measured by first
normalising to give the random variable $Z$ mean one; for intervals, the result
is the coefficient of variation, i.e.

    $$C(Z) = \frac{\sigma(Z)}{E(Z)}.$$


To examine the dispersion of the intervals in the data without confounding
it with the apparent inhomogeneity, the 1999 intervals were divided into 10
non-overlapping sections. The sample characteristics for each section are shown
in Table 3. The means within each group could be used to test for inhomogeneity,
but more importantly the coefficients of variation, skewness and kurtosis, which
for exponentially distributed intervals have values 1, 2, and 9, respectively,
give us rough measures of departure which are sufficient to validate the tests
for trend.
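The sectioned interval characteristics can be computed as sketched below. The
sketch assumes moment estimators with divisor $n$ and takes the kurtosis as the
standardised fourth moment (not the excess), which is the convention under which
the exponential value is 9; function names and data are illustrative.

```python
import math

def interval_characteristics(x):
    """Mean, coefficient of variation, skewness and kurtosis of intervals x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    sd = math.sqrt(m2)
    return {"mean": mean, "cv": sd / mean,
            "skewness": m3 / sd ** 3, "kurtosis": m4 / m2 ** 2}

def by_sections(x, sections):
    """Characteristics of consecutive, non-overlapping sections of equal size."""
    size = len(x) // sections
    return [interval_characteristics(x[i * size:(i + 1) * size])
            for i in range(sections)]

# Hypothetical check on a small symmetric data set.
stats = interval_characteristics([1.0, 2.0, 3.0, 4.0, 5.0])
parts = by_sections([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], 2)
```

Working within sections keeps a slowly varying rate from inflating the
dispersion measures, which is the confounding the text is guarding against.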

Table 3 gives no indication that the sample characteristics of the intervals
of the process depart from an exponential distribution (although there may be
correlation between intervals). The sample coefficients of variation are all
around one, as is the sample coefficient of variation for the whole set of data
as given in Table 2. We, therefore, proceed to use techniques based on the NHPP
model to examine the trend in more detail; further tests of the Poisson
assumption for this section of data are given in Section 7.


Table 4 gives successive test statistic values $U_r$ for the tests for null
parameters in the exponential polynomial model

    $$\lambda(t) = \exp\Bigl\{\sum_{m=0}^{r} \alpha_m t^m\Bigr\}.$$

This procedure was described in Section 6.1, and as remarked there, is used
fairly informally. A formal application would suggest stopping at $r = 2$ and
accepting a log-linear model

    $$\lambda(t) = e^{\alpha_0 + \alpha_1 t},$$

but the test statistic for $\alpha_3$, $U_3 = 5.3138$, is significantly large,
and the tests have been continued up to $r = 9$. For $r = 7, 8, 9$, the test
statistics are all small, well within the 5% limits of $\pm 1.96$.

Table 4 also gives the values of the log-likelihood function evaluated at the
maximum likelihood estimates. The log-likelihood must increase as more
parameters are added; the difference, when suitably normalised, is used to test
(asymptotically) for inclusion, or exclusion of parameters (see MacLean, 1974 or
Cox, 1972), and is known asymptotically to have a $\chi^2$ distribution. The
absolute differences, $\delta$, given in column three of Table 4 are clearly
correlated with values of the test statistic $U_r$, e.g. the large jump of 13.4
when including $\alpha_3$ in the likelihood goes with a large value of $U_3$.

The results of both the $U_r$ statistic and the likelihood function values
suggest that an exponential polynomial of degree 6 will fit the data very well.
The maximum likelihood estimates of the parameters and normalised values are
given in Table 5. In computing these estimates in an APL program using MacLean's
starting procedure, it is necessary to use normalised time $t/t_0 = u$ and
normalised parameters $\alpha'_m = \alpha_m t_0^m$ to avoid scale problems.

The resulting estimated rate function $\lambda(t; \hat\alpha)$ is plotted for
the high-activity period in Figure 4. The data gives an intimation of a growth
plus cyclic effect of fairly long period. A model for this could be

    $$\lambda(t) = \exp\{\alpha_0 + \alpha_1 t + \alpha_2 \sin(\omega_0 t)\};$$

the exponent is linear in the parameters if $\omega_0$ is fixed and known (e.g.
time of day effect). Moreover if the Taylor series expansion for the sine
function is used, one has an exponential polynomial with even index parameters
(beyond zero) equal to zero, i.e. $\alpha_2 = \alpha_4 = \cdots = 0$. This is the
reason why the test for the order of the exponential polynomial indicated that
we should have stopped at $r = 2$, and then gave an indication that $\alpha_3$
was non-zero. Cyclic effects are more easily handled via spectral methods; we
return to this in Section 8.

Another way to examine the trend is to use the kernel-type local smoothing
techniques of Section 5.2. Although these have broader applicability than the
particular global fitting under a NHPP assumption, they suffer, as do all
non-parametric density estimates (spectra, rate functions, probability density
functions, intensity functions), from the need to choose a suitable kernel and
bandwidth. In practice, it is usually reasonable to take a few different
bandwidths and, by eye, judge when a balance between small variability and small
bias is achieved.

A kernel-type rate function estimate $\hat\lambda(t; n, t_0) = n f_n(t)$ with
bandwidth $b_n = 1.25/n^{1/2}$ (chosen in the above way) is shown in Figure 5.
It again shows possible oscillatory behavior in the data, or greater dispersion
than we would expect under a HPP assumption. Confidence bands for this type of
estimate are available (Bickel and Rosenblatt, 1973; Lewis et al., 1975), but we
have preferred to give, in Figure 6, an identical smoothing of a simulated
homogeneous Poisson process of rate $\lambda = n/t_0$. Comparison of Figures 5
and 6 graphically illustrates that the data is not a HPP. The lack of gross
departures from Poisson-type characteristics for the interval structure was
discussed above; over-dispersion, rather than a trend, could give the large
fluctuations in the rate estimate.
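A reference smoothing of a simulated homogeneous Poisson process, of the kind
used for the comparison just described, could be produced along the following
lines. The sketch assumes simulation conditional on the number of events (the
event times are then ordered uniform variables), the triangular kernel, and a
bandwidth of $1.25/n^{1/2}$ on the $t/t_0$ scale; the value of `t0`, the seed and
the function names are hypothetical.

```python
import math
import random

def simulate_hpp_given_n(n, t0, seed=1):
    """Event times of a homogeneous Poisson process on (0, t0], given n events."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0.0, t0) for _ in range(n))

def smoothed_rate(t, times, t0):
    """n * f_n with a triangular kernel, bandwidth 1.25/sqrt(n) on the t/t0 scale."""
    n = len(times)
    b = 1.25 / math.sqrt(n)
    u = t / t0
    weights = sum(max(0.0, 1.0 - abs((u - tj / t0) / b)) for tj in times)
    return weights / (b * t0)   # events per unit of original time

n, t0 = 1999, 6000.0            # t0 is a made-up period length
events = simulate_hpp_given_n(n, t0)
grid = [t0 * k / 10.0 for k in range(1, 10)]
rates = [smoothed_rate(t, events, t0) for t in grid]
mean_rate = sum(rates) / len(rates)
```

Repeating the simulation with several seeds gives a visual impression of the
fluctuation to be expected under homogeneity, against which the smoothed data
can be judged.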

In Figure 5 there is a large peak at about t = 3000; we have examined the
data for any obvious anomalies at this point (e.g. very regular intervals) but
have found none. In Figure 7 we have overlaid the estimated integrated rate
function $\Lambda(t; \hat\alpha)$ (exponential polynomial degree 6) on the
empirical estimate of the integrated rate function, which is just the cumulative
number of events in $(0, t]$ as a function of $t$.

6.3 Applications to Low-Activity (L) Data

We now give, in abbreviated form, an analysis of low-activity (L) data, which
is similar to that given for high-activity (H) data in the previous section. The
low-activity data is the period beyond t = 143152 in Figure 2; the data is for
a time period of approximately 1.15 times as long as for the high-activity (H)
data, and only 1238 events (transaction initiations) occur. Overall
characteristics of the sample are shown in Table 6.

An immediate observation from Table 6 is that the coefficient of variation
of the intervals is high relative to the value 1 for an exponentially distributed
random variable. To examine this further, five sections of the data were taken
and the interval characteristics which were computed are given in Table 7. Each
section of data contained 231 observations. It is fairly apparent that the means
are decreasing (rate is increasing) over the five sections, the successive
differences, on the basis of the estimated standard deviations of the mean
estimates, being about three standard deviations. However, all the coefficients
of variation, coefficients of skewness, and kurtosis are larger than the
corresponding values for a Poisson process.

The first conclusion from the above analysis is that parametric detrending
for this low-activity data must be done with care; we return in Section 7 to
consideration of details of the structure of the low-activity process, but since
the intervals are more dispersed than for a Poisson process, there is consistency
with a cluster process hypothesis (Lewis, 1967; Vere-Jones, 1970). Note, too,
that a cluster process will look more and more like a Poisson process as activity
increases and this is consistent with the finding that the high-activity data
was approximately Poisson.

Returning to the trend analysis, we show in Figure 8 the cumulative number of
events in $(0, t]$ as a function of $t$, which is a non-parametric estimate of
the integrated rate function (dotted curve). It is by no means linear, and the
Kolmogorov-Smirnov test statistic (see Eqs. (6.6) and (6.7)) has value 6.048.
This, we would surmise, is significantly large even if the Poisson hypothesis
were not true.

In Table 8 we give the successive test statistics $U_r$ for successively more
complicated exponential polynomial rate functions. There is a very definite
overall increase in the rate, as measured by $U_1 = 11.696$, and again a
phenomenon where $\alpha_2$ and further coefficients (indices illegible in the
source) are not significant. However, it can also be seen that the tests are
significant out to $r = 10$; it was not possible, even if it were desirable, to
carry out the computations any further. The maximum log-likelihoods are also
given in Table 8. Since the data is non-Poisson, the likelihoods must be
interpreted very carefully. It is conceivable that using a likelihood based on a
Poisson process would force the rate estimation procedure to fit the
irregularity due to over-dispersion by added local wrinkles in the rate
function. It is, in fact, always difficult to discriminate between inhomogeneity
and over-dispersion, but it is almost certain that it is the over-dispersion
which gives rise to the high degree of the fitted polynomial for this data.

With the above qualifiers in mind, we have fitted an exponential polynomial
of degree 8 to the data. Degree 8 was chosen because of computational
limitations. The integrated rate function $\Lambda(t; \hat\alpha)$ is shown
overlaid on the non-parametric estimate in Figure 8; the eighth degree
exponential polynomial rate function $\lambda(t; \hat\alpha)$ with estimated
parameters is shown in Figure 9 (solid curve). Again the outstanding feature is
the cyclic nature of the rate, superposed on a generally increasing rate.

The kernel-type estimate of the rate function is also shown in Figure 9
(dotted curve); it is clear in comparing it to the exponential polynomial rate
function estimate that the procedure using the NHPP assumption works well
despite the apparent departure from a Poisson process; if anything, there is a
fairly clear indication of the results in Table 8 that an exponential polynomial
rate function of degree higher than 8 is needed.




It is also of interest to note that the estimated parameters with even
index $r$ are negative (Table 9), a pattern similar to that for the high-activity
data shown in Table 5, where $\alpha_2$ and three further coefficients (indices
illegible in the source) are negative, the remaining estimated $\alpha_r$'s
being positive. This is again illustrative of the cyclic effect in the data. It
is difficult to compare the magnitude of the estimates in the two periods since,
if there were a cycle in the data, the relative phase at the beginning of the
period of observations would influence the parameter values.


6.4 Applications to Complete Days Data

In Section 2 a very rough smoothing produced the smoothed estimate of the
rate of transaction initiations given in Figure 2. It is of interest to apply
the global smoothing based on a NHPP assumption and an exponential polynomial
rate function to the complete days data, even though it is not Poisson at low
activity, so as to have a formal, easily implemented procedure for this type of
data which does not involve a choice of smoothing functions and bandwidths.

Over the whole day 25,076 transaction initiations were observed; details of
the testing for the degree of the exponential polynomial, and the values of the
estimated parameters are not tabulated here. Briefly, the tests up to $r = 10$,
except for $r = 2$, indicate that the parameters are non-zero. Computation of the
moments for the $U_r$'s only up to $r = 10$ imposes a limitation on the fit; more
importantly, estimation of parameters in an exponential polynomial for an entire
day's data is not feasible for degree greater than 9. Thus, in Figure 10 we have
overlaid on the rate estimate for day 2 data given in Figure 2 an exponential
polynomial of degree 9. The agreement between the two estimates is good.

We would expect that as the degree of the polynomial went up, the local
fluctuations for the high and low-activity sections would appear. The
computational problems, however, are horrendous; it would be simpler to connect
up polynomial rate function estimates within smaller, contiguous sections. This
has not been pursued; in particular, it is not clear the polynomials would
connect smoothly.

The overall conclusion of this section is that the data is grossly
non-homogeneous; possible reasons will be discussed in Section 8.


7. Tests of Fit of the NHPP

In the discussion of Section 4, it was noted that by transforming the
observations in a NHPP with known rate function so that the times-to-events
become $T'_1 = \Lambda(T_1)$, $T'_2 = \Lambda(T_2)$, ..., the transformed process
is a homogeneous Poisson process with unit rate function. Moreover, by
conditioning on the number of events in $(0, t_0]$ [remainder of clause
illegible in source], the problem of testing for a NHPP can be reduced to
testing [alternatives illegible in source] that the times
$T'_1, \ldots, T'_n$ are order statistics from a uniform distribution. Other
tests [text partly illegible in source] are given in Cox and Lewis (1966, Ch. 6).

The transformation is shown in Figure 11.

Testing for a NHPP with unknown rate function is more difficult. The
analogous problem in regression analysis is to test the usual assumption that the
residuals in an additive model

    $$Y_i = g(u_i) + \epsilon_i$$

are independent normal random variables with mean zero and constant variance
$\sigma^2$.

The problem is that after estimating the parametric mean value function, the
residuals $\hat\epsilon_i = Y_i - g(u_i; \hat\beta)$ are no longer independent
and normally distributed (e.g. see Daniel and Wood, 1971).

An analogous procedure suggested by Lewis (1970), using Theorem 4.1, is to
estimate the parameters in the parametric integrated rate function
$\Lambda(t; \alpha)$ via maximum likelihood, giving an estimate which we denote
by $\Lambda(t; \hat\alpha)$ or $\hat\Lambda(t)$, and then to detrend the process
by transforming it to obtain $T'_1 = \hat\Lambda(T_1; \hat\alpha)$,
$T'_2 = \hat\Lambda(T_2; \hat\alpha)$, .... We would expect the departures from
a homogeneous process to be small if the number of observations is large and the
number of parameters small, and, of course, if the completely specified NHPP is
correct.
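The detrending transformation can be sketched as follows, assuming an
exponential polynomial rate with already estimated coefficients and a Simpson
rule for the integrated rate; the function names and the coefficient values in
the example are hypothetical.

```python
import math

def integrated_rate(alpha, t, steps=400):
    """Lambda(t): Simpson-rule integral of exp(sum_m alpha_m u^m) over (0, t]."""
    rate = lambda u: math.exp(sum(a * u ** m for m, a in enumerate(alpha)))
    h = t / steps
    total = rate(0.0) + rate(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * rate(i * h)
    return total * h / 3.0

def detrend(times, alpha_hat):
    """Transformed times T'_i = Lambda(T_i; alpha_hat)."""
    return [integrated_rate(alpha_hat, t) for t in times]

# Hypothetical check: a constant rate 2 simply doubles the time scale.
doubled = detrend([1.0, 2.5], [math.log(2.0)])
# Log-linear rate exp(t): Lambda(1) = e - 1 in closed form.
loglinear = integrated_rate([0.0, 1.0], 1.0)
```

If the fitted model is adequate, the intervals between successive transformed
times should behave approximately like unit-mean exponential variables, which is
what the tests discussed next examine.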

Very little is known about this procedure. Note, however, that if the
uniform conditional test is used with (conditional) Kolmogorov-Smirnov
statistics, the problem is that of Kolmogorov-Smirnov tests of fit after
parameter estimation. Lilliefors (1967, 1969) has investigated this for
exponential and normal random variables; as expected, the estimated distribution
function (integrated rate function) is, on average, closer to the empirical
distribution function (empirical integrated rate function) than without
parameter estimation. More recent work on Kolmogorov-Smirnov tests with
estimated parameters is not yet developed for our purposes. Tests for a
homogeneous Poisson process based on spectra (Cox and Lewis, 1966, Ch. 6) should
be less sensitive to parameter estimation.

We now apply these methods to the low and high-activity periods in an
informal manner, relying more on properties of the intervals and the count
spectra than on the rate function.

7.1 High-Activity Data - Test for NHPP

[...] short intervals than would occur under the NHPP assumption.

To proceed with the analysis of the detrended high-activity data, in Table 10
we give results of several tests for dependence of intervals in the process. The
normalised, estimated first serial correlation coefficient
$(n-1)^{1/2} \hat\rho_1$ has a value -2.3532, larger in magnitude than the 1%
level of the normal distribution, while the tests for independence based on the
cumulated periodogram (raw interval spectral density estimate) using the
Kolmogorov-Smirnov statistic and the Anderson-Darling statistic (Cox and Lewis,
1966, Ch. 6) are just significant at a 1% level.
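The normalised first serial correlation coefficient can be computed as in the
sketch below. The estimator shown (lag-one products of deviations from the
overall mean, divided by the sum of squared deviations) is one common form and
is an assumption here; the definition used in the report's program may differ
slightly. Names and data are illustrative.

```python
import math

def first_serial_correlation(x):
    """Estimated lag-one serial correlation of the intervals x."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def normalised_serial_correlation(x):
    """(n - 1)^(1/2) * rho_1, approximately N(0, 1) for independent intervals."""
    return math.sqrt(len(x) - 1) * first_serial_correlation(x)

# Hypothetical check: strictly alternating intervals are negatively correlated.
rho = first_serial_correlation([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
z = normalised_serial_correlation([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
```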

We note that the smoothed interval spectral density, as computed in the
SASI-VI program, shows no characteristic departure from flatness, and serial
correlations beyond the first are small. Thus, there appears to be only a
residual dependence in the intervals, possibly due to the detrending or a
residual trend.

Similarly, the estimated spectrum of counts (Cox and Lewis, 1966, Ch. 5;
Lewis, 1970) has no significant departure from flatness, showing that a Poisson
process is a tenable hypothesis for the detrended data and consequently a NHPP
hypothesis for the original data.

However, some very subtle departures from exponentiality appear when one
looks at the interval properties of the detrended process. These are given in
Table 11. In the first place, the estimated coefficient of variation of times
between events, $\hat C(X')$, is smaller than 1. Estimated from five sections of
the data, it has value $\hat C(X') = 0.9673$, with estimated standard deviation
0.0775, which is too large to give conclusive evidence of departure from the
value $C(X') = 1$ for a Poisson process.

This artifact of the data shows up clearly in an estimate of the intensity
function, $m_f(t)$. There is a definite notch at zero in the estimate
$\hat m_f(\Delta t)$ (Cox and Lewis, 1966, Ch. 5). Thus, there are only 720
observations within $\Delta$ of the origin, and subsequently the estimate is
essentially flat, never deviating in any interval $\Delta$ from the modal value
of 1,000 by more than 50.

Checking of the transaction initiation process showed that there was, in fact,
a minimum time between transaction initiations imposed by the system. A simple
model of a Poisson process with blocking (Type I counter) is sufficient to
account for the deviations from a Poisson process.
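To illustrate how a blocking model lowers the coefficient of variation, the
sketch below assumes the simplest Type I counter: each interval is a fixed dead
time plus an exponential waiting time with the given rate, so that the mean is
`dead_time + 1/rate`, the standard deviation is `1/rate`, and the coefficient of
variation is `1/(1 + rate * dead_time)`. The dead time of the system is not
given in the text, so the implied value below is purely illustrative.

```python
def cv_with_dead_time(rate, dead_time):
    """Coefficient of variation of intervals dead_time + Exp(rate)."""
    return 1.0 / (1.0 + rate * dead_time)

def implied_dead_time_product(cv):
    """Value of rate * dead_time that would reproduce a given coefficient of variation."""
    return 1.0 / cv - 1.0

# A coefficient of variation of 0.9673, as estimated above, would correspond to
# a dead time of roughly 3% of the mean exponential waiting time.
product = implied_dead_time_product(0.9673)
```

Given the standard deviation quoted for the estimated coefficient of variation,
such a figure should be read only as an order of magnitude.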

Another artifact in the data appears in the fact that the estimated
coefficients of skewness and kurtosis, $\hat{\gamma}_1(X')$ and
$\hat{\gamma}_2(X')$, for the data (5.2363 and 66.3916 in Table 11) are large
compared to the Poisson process values $\gamma_1(X) = 2$, $\gamma_2(X) = 9$.
These are due to occasional very large times between transaction
initiations; these seem to occur in very short periods of high variability of
times between transaction initiations. This shows up in Figure 5 as the spike
at about t = 3000.
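
The coefficients used here can be computed from sample central moments. The following is a sketch under the convention that the coefficient of kurtosis is the fourth central moment divided by the squared variance (not the excess kurtosis), which is the convention under which an exponential distribution gives the value 9 quoted above. The function name and the simulated sample are illustrative only.

```python
import math
import random

def moment_coefficients(xs):
    """Return (coefficient of variation, skewness, kurtosis) of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    cv = math.sqrt(m2) / mean
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return cv, skewness, kurtosis

# Exponential intervals, as for a Poisson process with unit rate.
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(200_000)]
cv, skewness, kurtosis = moment_coefficients(sample)
print(f"C = {cv:.3f}, gamma_1 = {skewness:.3f}, gamma_2 = {kurtosis:.3f}")
```

For a large exponential sample the three values should be close to 1, 2 and 9. The higher moments are sensitive to a few very large intervals, which is consistent with the explanation given for the large observed values.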

No explanation has been found for this departure from the NHPP; it could be
due to special procedures in the use of the system but in any event is too minor
to affect practical use of the NHPP model in evaluating such a system.

7.2 Low-Activity Data - Test for NHPP

The low-activity data, after detrending with an estimated rate function
$\hat{\Lambda}(t; \hat{\alpha})$ which is the integral of an exponential
polynomial of degree 8, to give $T'_1 = \hat{\Lambda}(T_1)$,
$T'_2 = \hat{\Lambda}(T_2)$, ..., shows a very definite indication of departure
from a Poisson process. For $\hat{C}(X')$, $\hat{\gamma}_1(X')$,
$\hat{\gamma}_2(X')$, we obtain values 1.475, 4.1233, 21.716, respectively, and
these are too large to be consistent with a Poisson hypothesis after
detrending.
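
The detrending transformation can be sketched as follows. The rate is taken to be an exponential polynomial, exp(alpha_0 + alpha_1 t + ...), and the integrated rate is approximated by the trapezoidal rule, which is a choice made here for illustration and not necessarily the numerical method used in the analysis. The coefficients and event times below are hypothetical, and a degree-1 polynomial is used so that the result can be compared with a closed form.

```python
import math

def rate(t, alpha):
    """Exponential polynomial rate: exp(alpha[0] + alpha[1]*t + alpha[2]*t**2 + ...)."""
    return math.exp(sum(a * t ** k for k, a in enumerate(alpha)))

def integrated_rate(t, alpha, steps=2000):
    """Trapezoidal approximation to the integral of the rate over (0, t]."""
    if t <= 0:
        return 0.0
    h = t / steps
    total = 0.5 * (rate(0.0, alpha) + rate(t, alpha))
    total += sum(rate(i * h, alpha) for i in range(1, steps))
    return total * h

def detrend(event_times, alpha):
    """Map each event time T_i to the transformed time given by the integrated rate."""
    return [integrated_rate(t, alpha) for t in event_times]

alpha = [math.log(0.2), 1e-3]        # hypothetical coefficients, degree 1
times = [5.0, 12.0, 30.0, 31.5]      # hypothetical event times
transformed = detrend(times, alpha)
print([round(x, 4) for x in transformed])
```

If the events do come from a NHPP with this rate, the transformed times form a Poisson process of unit rate, so the tests for a homogeneous Poisson process can be applied to them.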

The data also shows considerable interval correlation. A detailed analysis
will not be given here, especially since the detrending process is not completely
valid. However, as remarked earlier, the low-activity data after detrending is
consistent with a cluster process hypothesis. We emphasize that "consistent"
here refers only to matching of gross characteristics of the observed and
theoretical processes; there is no known formal way of verifying a
non-homogeneous cluster process hypothesis.

8.  Discussion 

The outstanding feature of this data is the oscillatory nature of the rate
function in both the high and low activity periods. Such oscillatory behavior
is usually investigated by spectral analysis, but this, of course, is applicable
only to stationary data. The data shows a gross time-of-day effect superposed on
the oscillations, and it is not simple to filter this out, most particularly
because the period of the oscillation is long, i.e. low frequency. It is,
therefore, likely to become mixed up in a spectral analysis with long term
evolutionary (time-of-day) trends.

Nevertheless, an attempt was made to examine the cyclic effect in time
periods H and L by

(a) detrending (Section 7) after fitting an exponential polynomial of
degree 1;

(b) computing the count spectrum of the detrended data using SASE IV.

The results of these spectral analyses showed generally flat spectra, with
peaks at a low frequency corresponding to a rough guess at the frequency of the
cycle, which was obtained from Figures 4 and 9. There seems to be no evidence of
a fixed frequency cycle; this would show up as a sharp peak in the spectrum.
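
The idea of a count spectrum can be sketched as follows: the detrended events are counted in adjacent bins of equal width and the periodogram of the mean-corrected counts is computed. This is only an unsmoothed illustration written with a direct discrete Fourier transform; it is not the SASE IV computation, and the function names and the artificial counts are assumptions made for the example.

```python
import cmath
import math

def counts_in_bins(event_times, width, n_bins):
    """Number of events falling in each of n_bins adjacent bins of the given width."""
    counts = [0] * n_bins
    for t in event_times:
        k = int(t // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

def count_periodogram(counts):
    """Periodogram ordinates of mean-corrected counts at frequencies 1/n, ..., 1/2 per bin."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]
    ordinates = []
    for freq in range(1, n // 2 + 1):
        dft = sum(x * cmath.exp(-2j * math.pi * freq * k / n)
                  for k, x in enumerate(centered))
        ordinates.append(abs(dft) ** 2 / n)
    return ordinates

# Artificial counts that repeat exactly every 4 bins (a fixed frequency cycle).
cyclic_counts = [3, 2, 1, 2] * 8
ordinates = count_periodogram(cyclic_counts)
peak = max(range(len(ordinates)), key=ordinates.__getitem__) + 1
print(f"largest ordinate at {peak} cycles per {len(cyclic_counts)} bins")
```

A strictly periodic pattern such as the artificial one above concentrates all of its power at a single frequency, which is the sharp peak that was not found in the data; counts from a Poisson process give ordinates that fluctuate about a constant level.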

The cycles observed in this exploratory analysis of a single series of events
in the system bring up some interesting, difficult, and as yet, unresolved
methodological and phenomenological questions.

(1) The global techniques for rate function estimation need to be extended
to larger sections of data as the best overall way of looking at this
data. The most practical way of doing this would appear to be to apply
the technique to non-overlapping or overlapping sections of the data.
The problem of joining sections might lead to (exponential) spline
function techniques; new problems of testing then arise.

(2) The question arises as to what causes the oscillatory or cyclic effect;
in the Introduction we pointed out that the transaction initiation
process is an output or response process so that it is presumably driven
by other processes associated with the system (e.g. message arrivals).
The implications of this from a methodological point of view are twofold:

(a) The deterministic rate function estimated in previous sections
might be considered, at least in the micro-aspects, to be purely
descriptive. There is a possibility that what we are seeing is the
effect of congestion in the system (e.g. DL/I component), and the
data may perhaps be best described by something like a self-exciting
process (Hawkes, 1972), which is the point process analog of an
autoregressive system. This would not be inconsistent with our
findings, since (linear) self-exciting processes are special types
of cluster processes (Hawkes and Oakes, 1974). One problem with
the above interpretation of the cyclic effect is that we would
expect more oscillatory effect during high activity periods than
during low activity periods. However, just the opposite is true.

(b) Since the observed transaction initiation process is driven by
other processes associated with the system, a full description of
the behavior of the system would involve an attempt to correlate
the transaction initiation process studied in this paper with
processes at other points of the system. In particular, it would be
of interest to correlate the transaction initiation process with
the process of message arrivals from terminals. It would also be
desirable to correlate the transaction initiation process with the
successive response times experienced by users of the system.

There are many methodological problems in analysing very non-stationary
data, in particular the problem of estimating correlation and/or coherence.
For the present case, the fact that the high activity data is close to Poisson,
although non-homogeneous, should make development of the necessary methodology
simpler. The work of Cox and Lewis (1972), and particularly Cox (1972), should
be useful.
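
The connection between self-exciting and cluster processes noted in item (2)(a) can be illustrated by simulating a linear self-exciting process through its cluster representation: immigrants arrive as a homogeneous Poisson process, and every event independently produces a Poisson number of offspring at exponentially distributed delays. This is a minimal sketch under those assumptions; the function names, the exponential delay distribution, and all parameter values are hypothetical and are not fitted to the data of this paper.

```python
import random

def poisson_variate(mean, rng):
    """Poisson variate: number of unit-rate exponential gaps that fit in (0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_self_exciting(mu, eta, beta, horizon, rng):
    """Simulate events on (0, horizon) by the cluster (branching) construction.

    mu   : rate of immigrant (cluster centre) events
    eta  : mean number of offspring per event, must be below 1
    beta : rate of the exponential delay from parent to offspring
    """
    generation = []
    t = rng.expovariate(mu)
    while t < horizon:
        generation.append(t)
        t += rng.expovariate(mu)
    events = []
    while generation:
        events.extend(generation)
        offspring = []
        for parent in generation:
            for _ in range(poisson_variate(eta, rng)):
                child = parent + rng.expovariate(beta)
                if child < horizon:
                    offspring.append(child)
        generation = offspring
    return sorted(events)

rng = random.Random(7)
events = simulate_self_exciting(mu=0.1, eta=0.5, beta=1.0, horizon=10_000.0, rng=rng)
print(f"{len(events)} events; long-run rate mu / (1 - eta) = {0.1 / (1 - 0.5):.2f}")
```

With a mean of 0.5 offspring per event, each immigrant heads a cluster of two events on average, so roughly 2,000 events are expected on this horizon, and the intervals between events are more variable than those of a Poisson process of the same rate.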


References 

Anderson, T. W. (1971). Statistical Analysis of Time Series, Wiley, New York.

Bickel, P. J. and Rosenblatt, M. (1973). "On Some Global Measures of the
Deviations of Density Function Estimates," Ann. Math. Stat. 44, 1071-1075.

Brown, M. (1972). "Statistical Analysis of Non-Homogeneous Poisson Processes."
In Stochastic Point Processes, P. A. W. Lewis (ed.), Wiley, New York, 67-89.

Çinlar, E. (1972). "Superposition of Point Processes." In Stochastic Point
Processes, P. A. W. Lewis (ed.), Wiley, New York, 549-606.

Çinlar, E. (1975). Introduction to Stochastic Processes, Prentice-Hall,
Englewood Cliffs, New Jersey.

Cox, D. R. (1972). "The Statistical Analysis of Dependencies in Point Processes."
In Stochastic Point Processes, P. A. W. Lewis (ed.), Wiley, New York, 55-66.

Cox, D. R. and Lewis, P. A. W. (1966). The Statistical Analysis of Series of
Events, Methuen, London; Wiley, New York and Dunod, Paris.

Daniel, C. and Wood, F. S. (1971). Fitting Equations to Data: Computer Analysis
of Multifactor Data for Scientists and Engineers, Wiley-Interscience, New York.

Gnedenko, B. V. and Kovalenko, I. N. (1969). Introduction to Queueing Theory.
Tr. by D. Louvish, Daniel Davey and Co., Hartford, Conn.

Hawkes, A. G. (1972). "Mutually Exciting Point Processes." In Stochastic Point
Processes, P. A. W. Lewis (ed.), Wiley, New York, 261-271.

Hawkes, A. G. and Oakes, D. (1974). "A Cluster Process Representation of a
Self-Exciting Process," J. Appl. Prob. 11, 493-504.

IBM Corp. (1973). "Information Management System/360, Version 2, General
Information Manual," GH20-0765, IBM Corp., Armonk, New York.

Lavenberg, S. S. and Shedler, G. S. (1975). "A Queueing Model of the DL/I
Component of IMS," IBM Research Report RJ-1361, San Jose, Calif.

Lewis, P. A. W. (1965). "Some Results on Tests for Poisson Processes,"
Biometrika 52, 67-77.

Lewis, P. A. W. (1967). "Non-homogeneous Branching Poisson Processes," J. Royal
Statist. Soc. B 29, 343-354.


Lewis, P. A. W. (1970). "Remarks on the Theory, Computation and Application of
the Spectral Analysis of Series of Events," J. Sound Vib. 12, 353-375.

Lewis, P. A. W. (1972). "Recent Results in the Statistical Analysis of
Univariate Point Processes." In Stochastic Point Processes, P. A. W. Lewis
(ed.), Wiley, New York, 1-54.

Lewis, P. A. W., Katcher, A. M. and Weis, A. H. (1969). "SASE IV, an Improved
Program for the Statistical Analysis of Series of Events," IBM Research
Report RC 2365, Yorktown Heights, New York.

Lewis, P. A. W., Liu, L. H., Robinson, D. W. and Rosenblatt, M. (1975).
"Empirical Sampling Study of a Goodness of Fit Statistic for Density Function
Estimation," Naval Postgraduate School Report NPS55Lw75031, Monterey, Calif.

Lewis, P. A. W. and Robinson, D. W. (1974). "Testing for a Monotone Trend in
a Modulated Renewal Process." In Reliability and Biometry, F. Proschan (ed.),
SIAM, Philadelphia, 163-181.

Lewis, P. A. W. and Shedler, G. S. (1973). "Empirically Derived Micromodels for
Sequences of Page Exceptions," IBM J. Res. Devel. 17, 86-100.

Lilliefors, H. W. (1967). "On the Kolmogorov-Smirnov Test for Normality with
Mean and Variance Unknown," J. Amer. Statist. Assoc. 62, 399-402.

Lilliefors, H. W. (1969). "On the Kolmogorov-Smirnov Test for the Exponential
Distribution with Mean Unknown," J. Amer. Statist. Assoc. 64, 387-389.

MacLean, C. J. (1974). "Estimation and Testing of an Exponential Polynomial Rate
Function Within the Non-Stationary Poisson Process," Biometrika 61, 81-86.

Rosenblatt, M. (1956). "Remarks on Some Non-Parametric Estimates of a Density
Function," Ann. Math. Stat. 27, 3, 832-837.

Vere-Jones, D. (1970). "Stochastic Models for Earthquake Occurrence," J. Royal
Statist. Soc. B 32, 1-62.




TABLE 2

Sample Characteristics of Times-Between-Events.
Transaction Initiation Process for Time Period H.

n                      number of transactions initiated                         1999

                       period of observation                                    11936.6066

$\bar{X}$              estimated mean time between transaction initiations      5.9698

$\hat{C}(X)$           estimated coefficient of variation of times
                       between transaction initiations                          1.0533

$\hat{\gamma}_1(X)$    estimated coefficient of skewness of times
                       between transaction initiations                          6.7399

$\hat{\gamma}_2(X)$    estimated coefficient of kurtosis of times
                       between transaction initiations                          107.7282

$X_{\max}$             maximum time between transaction initiations             133.6488

$X_{\min}$             minimum time between transaction initiations             0.0152


TABLE 3

Sample Characteristics of Times-Between-Events.
Transaction Initiation Process for Ten Sections
of Time Period H.

Section        sample mean    s.d. of mean          coeff. of variation   coeff. of skewness    coeff. of kurtosis
               $\bar{X}$      $\hat{\sigma}(\bar{X})$   $\hat{C}(X)$      $\hat{\gamma}_1(X)$   $\hat{\gamma}_2(X)$

1              (illegible)    (illegible)           0.6430                2.3096                11.4548
2              6.0564         (illegible)           0.8328                2.2653                12.1494
3              5.4878         (illegible)           1.3916                7.7614                84.6585
4              6.1348         (illegible)           0.8789                1.1901                3.9449
5              5.0611         (illegible)           0.7954                2.9991                18.6264
6              (illegible)    (illegible)           0.7?06                2.265?                7.09??
7              7.5952         0.7779                1.4446                8.0075                89.8598
8              6.2456         0.3631                0.8652                1.8087                7.11?4
9              4.2847         0.2425                (illegible)           1.6654                6.6807
10             4.5566         0.2313                0.7750                1.7533                8.1512

mean           5.9706         0.4137                0.9583                3.2026                25.2341
s.d. of mean   0.3591         (illegible)           0.0873                0.7952                10.4197


TABLE 4

Values of Maximum Log-likelihood and Test Statistic in NHPP
Exponential Polynomial Rate Function for Times Between
Transaction Initiations for Time Period H.

maximum             absolute        test
log-likelihood      difference      statistic
max log L                           $U_r$

-5563.8                             3.8387
-5562.8             1.0             1.5727
-5549.4             13.4            5.3138
-5548.9             0.5             -0.4437
-5539.9             10.0            -4.2081
-5537.0             2.9             -2.6188
-5536.9             0.1             0.0188
-5534.8             0.1             0.1211
-5536.8             0.0             0.2038


TABLE 6

Sample Characteristics of Times-Between-Events.
Transaction Initiation Process for Time Period L.

n                      number of transactions initiated                         1258

                       period of observation                                    13819.51927

$\bar{X}$              estimated mean time between transaction initiations      10.9809

$\hat{C}(X)$           estimated coefficient of variation of times
                       between transaction initiations                          1.6563

$\hat{\gamma}_1(X)$    estimated coefficient of skewness of times
                       between transaction initiations                          3.7524

$\hat{\gamma}_2(X)$    estimated coefficient of kurtosis of times
                       between transaction initiations                          18.9686

$X_{\max}$             maximum time between transaction initiations             145.4241

$X_{\min}$             minimum time between transaction initiations             0.0263

TABLE 7

Sample Characteristics of Times-Between-Events.
Transaction Initiation Process for Five Sections
of Time Period L.

Section        sample mean    s.d. of mean              coeff. of variation   coeff. of skewness    coeff. of kurtosis
               $\bar{X}$      $\hat{\sigma}(\bar{X})$   $\hat{C}(X)$          $\hat{\gamma}_1(X)$   $\hat{\gamma}_2(X)$

1              18.4683        1.6760                    1.4378                2.25                  (illegible)
2              12.5333        1.2289                    1.5534                3.5112                16.6713
3              9.2178         0.9318                    1.6015                5.0123                32.5160
4              8.2978         0.9430                    1.8005                4.2290                23.2152
5              6.2124         0.3806                    0.9706                3.7494                23.7669

mean           10.9459        1.0321                    1.4728                3.7519                20.8242
s.d. of mean   2.1390         0.2116                    0.1386                0.4532                4.0867


TABLE 10

Tests for Dependence on Serial Number and Dependence
Between Intervals, Detrended (NHPP Exponential Polynomial
Rate Function of Degree 6) Transaction Initiation
Process for Time Period H.

n            number of transactions initiated                       1999

             estimated serial correlation coefficient
             of lag 1 for times between transaction
             initiations                                            -0.0576
                                                                    -2.5532

Tests for serial independence based on
cumulated periodogram

$D_{n/2}$    Kolmogorov-Smirnov statistic                           1.4897
             (upper 1% point is 1.518)

             Anderson-Darling statistic                             3.9941
             (upper 1% point is 3.857)

TABLE 11

Sample Characteristics of Times-Between-Events.
Detrended (NHPP Exponential Polynomial Rate Function
of Degree 6) Transaction Initiation Process for Time
Period H.

n                       number of transactions initiated                        1999

                        period of observation                                   1999.02

$\bar{X}'$              estimated mean time between transaction initiations     0.9998

$\hat{C}(X')$           estimated coefficient of variation of times
                        between transaction initiations                         0.9784

$\hat{\gamma}_1(X')$    estimated coefficient of skewness of times
                        between transaction initiations                         5.2363

$\hat{\gamma}_2(X')$    estimated coefficient of kurtosis of times
                        between transaction initiations                         68.3916

$X'_{\max}$             maximum time between transaction initiations            17.4752

$X'_{\min}$             minimum time between transaction initiations            0.0031



Figure  1.  IMS  system  configuration.  Conceptual  diagram  of  a computer 
system  running  IMS. 


Figure 2. Estimated mean number of transactions initiated in a unit time interval
for days 1, 2 and 3. Estimates obtained by averaging counts in 4800 adjacent unit
time intervals. This very severe smoothing takes out local fluctuations but gives a
picture of how the activity varies over a full day.




Figure 3. Cumulative number of transactions initiated for time period H (high
activity). There is enough departure from linearity to suggest inhomogeneity in
the data. The test for a homogeneous Poisson process using the Kolmogorov-
Smirnov statistic confirms the departure; the value 2.389 of the Kolmogorov-
Smirnov statistic is highly significant.



Figure 4. Estimate $\hat{\lambda}(t; \hat{\alpha})$ of NHPP rate function using
exponential polynomial (degree 6) of transaction initiation process for time
period H (high activity).




Figure 7. Parametric and empirical estimates of the integrated rate function
for time period H (high activity). Solid curve is the NHPP estimate
$\hat{\Lambda}(t; \hat{\alpha})$ using exponential polynomial (degree 6).
Dotted curve is the cumulative number of events for time period H.




Figure 8. Parametric and empirical estimates of the integrated rate function
for time period L (low activity). Solid curve is the NHPP estimate
$\hat{\Lambda}(t; \hat{\alpha})$ using exponential polynomial (degree 8).
Dotted curve is the cumulative number of events for time period L.


Figure 9. Estimates of the rate function for time period L (low activity).
Solid curve is the NHPP estimate $\hat{\lambda}(t; \hat{\alpha})$ using
exponential polynomial (degree 8). Dotted curve is the estimate
$\hat{\lambda}(t; n, b)$ using a kernel-type density estimator. Sample size
n = 1258, band-width $b(n) = 1.25/n^{1/2}$.




Figure 10. Estimates of the rate function of the transaction initiation process
for day 2. Solid curve is a global estimate based on an exponential
polynomial of degree 9. Dotted curve is local estimate obtained as in Figure 2.
The high activity (H) and low activity (L) time periods are marked on the
figure.



